Recently, Schaerer et al. (2023) reported that gender discrimination in hiring decisions has been eliminated and even reversed. They performed a meta-analysis covering 44 years of studies and recruited a “red team” (Lakens, 2020) to critique their work. I applaud their rigorous work and their pursuit of best practices in science. However, as they noted, the heterogeneity of the true effects was extremely large (I² = 82.8%). This implies that 82.8% of the total variance in the observed effects can be attributed to between-study heterogeneity rather than sampling error. The pooled odds ratio, the ratio of men's odds of a successful application to women's odds, was 0.91 (95% confidence interval 0.86 to 0.97, z = -3.00, p = .003). But the 95% prediction interval is as wide as 0.49 to 1.70. In other words, the true effect in a new study is expected to fall within this range with 95% probability. This is obvious from their Figure 4, which is pasted below.
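A little arithmetic connects the two intervals. The sketch below is a back-of-the-envelope reconstruction, not Schaerer et al.'s code, and it rests on three assumptions:

- the intervals are symmetric on the log odds-ratio scale;
- both intervals use normal quantiles (a t-based prediction interval, which some software uses, would imply a slightly smaller τ);
- true log odds ratios are normally distributed.

Under these assumptions, the code backs out the standard error of the pooled estimate and the between-study standard deviation τ from the numbers quoted above.

```r
# Back-of-the-envelope check using the summary statistics quoted in the text.
# Assumptions: symmetric intervals on the log odds-ratio scale, normal (z)
# quantiles for both intervals, and normally distributed true log odds ratios.
z975 <- qnorm(0.975)

log_ci <- log(c(0.86, 0.97))   # 95% confidence interval of the pooled OR
log_pi <- log(c(0.49, 1.70))   # 95% prediction interval

# Standard error of the pooled log OR, from the CI half-width
se <- diff(log_ci) / (2 * z975)

# Prediction-interval half-width = z * sqrt(tau^2 + se^2); solve for tau
tau2 <- (diff(log_pi) / (2 * z975))^2 - se^2
tau <- sqrt(tau2)

# Implied share of settings whose true OR lies above 1
p_above_1 <- pnorm(0, mean = log(0.91), sd = tau, lower.tail = FALSE)

round(c(se = se, tau = tau, p_above_1 = p_above_1), 3)
```

Under these assumptions, the standard error of the pooled log odds ratio is only about 0.03, while τ is about 0.32, roughly ten times larger. About 38% of new settings would then have a true odds ratio above 1. This gap is why the prediction interval is so much wider than the confidence interval.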
According to their data, the samples are concentrated in the United States and Western Europe. They performed a leave-one-out analysis and examined potential country-level confounders, such as each country's gender inequality, education level, GDP per capita, Human Development Index, and even culture (Muthukrishna et al., 2020). However, I still wonder whether there are country-level confounders that they did not consider. Therefore, I would like to break the data down by country to see whether there are country-level systematic biases.
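A leave-one-out analysis re-estimates the pooled effect with one unit removed at a time. Since the concern here is country-level bias, that unit can be a whole country rather than a single study.

The sketch below is a hypothetical base-R illustration of this idea:

- `dl_pool()` and `leave_one_country_out()` are names introduced here;
- it uses the DerSimonian-Laird random-effects estimator, which is not necessarily the estimator Schaerer et al. used;
- it runs on made-up log odds ratios `yi` with sampling variances `vi`.

For a real analysis, the metafor package (`rma()`, `leave1out()`) would be the usual tool.

```r
# Hypothetical illustration with made-up data: yi = log odds ratio of each
# study, vi = its sampling variance, country = where the study was run.
toy_meta <- data.frame(
  yi      = c(-0.30, -0.10, 0.05, 0.40, -0.20, 0.60),
  vi      = c(0.04, 0.02, 0.03, 0.05, 0.02, 0.10),
  country = c("A", "A", "B", "B", "C", "D")
)

# DerSimonian-Laird random-effects estimate of the pooled log odds ratio
dl_pool <- function(yi, vi) {
  w <- 1 / vi
  mu_fe <- sum(w * yi) / sum(w)                   # fixed-effect estimate
  q <- sum(w * (yi - mu_fe)^2)                    # Cochran's Q
  c_val <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (length(yi) - 1)) / c_val)  # between-study variance
  w_re <- 1 / (vi + tau2)                         # random-effects weights
  mu <- sum(w_re * yi) / sum(w_re)
  c(mu = mu, se = sqrt(1 / sum(w_re)), tau2 = tau2)
}

# Re-run the pooling with each country's studies removed in turn
leave_one_country_out <- function(yi, vi, country) {
  countries <- sort(unique(country))
  res <- t(sapply(countries, function(cty) {
    keep <- country != cty
    dl_pool(yi[keep], vi[keep])
  }))
  data.frame(left_out  = countries,
             pooled_OR = unname(exp(res[, "mu"])),
             tau2      = unname(res[, "tau2"]))
}

exp(dl_pool(toy_meta$yi, toy_meta$vi)[["mu"]])  # pooled OR, all studies
leave_one_country_out(toy_meta$yi, toy_meta$vi, toy_meta$country)
```

If removing a country shifts the pooled odds ratio or τ² noticeably, that country's studies are pulling the overall estimate. With real data, the same functions would take the study-level log odds ratios and variances instead of the toy values.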
Thanks to their open science practices, their data and code are publicly available here.
```r
## Load packages
library(tidyverse)
library(cowplot)

## Load data
dat <- readxl::read_excel("Data.xlsx", sheet = "DATA")
```
We will first calculate the mean odds ratio for each country or region. Then we will highlight the countries or regions with the highest and lowest odds ratios.
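For reference, the quantity computed below is the odds ratio. In the formula, S_M, F_M, S_F, and F_F stand for the columns `MaleSucc`, `MaleFail`, `FemaleSucc`, and `FemaleFail`; these symbols are introduced here for illustration only. The second expression is the standard large-sample (Woolf) variance of the log odds ratio. The code below does not use it, but any inverse-variance weighting would need it.

```latex
\mathrm{OR} = \frac{S_M / F_M}{S_F / F_F} = \frac{S_M \, F_F}{F_M \, S_F},
\qquad
\widehat{\mathrm{Var}}\left(\log \mathrm{OR}\right) = \frac{1}{S_M} + \frac{1}{F_M} + \frac{1}{S_F} + \frac{1}{F_F}
```

An odds ratio above 1 means that men's odds of success exceed women's. Both expressions break down when any cell is zero, which is why a continuity correction (commonly 0.5 added to each cell) is often applied.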
```r
# Odds ratio: men's odds of success divided by women's odds of success
# (OR > 1 favors men, OR < 1 favors women; a zero cell yields 0, Inf, or NaN)
dat <- dat %>%
  mutate(OR = MaleSucc * FemaleFail / (MaleFail * FemaleSucc),
         N = MaleN + FemaleN)

# Mean odds ratio and total sample size by country
dat_country <- dat %>%
  group_by(DataCounty) %>%
  summarise(mean_OR = mean(OR, na.rm = TRUE),
            N = sum(N, na.rm = TRUE)) %>%
  arrange(mean_OR)

# Bar chart of the mean odds ratio by country, filled by total sample size
fig_country_mean <- dat_country %>%
  ggplot(aes(x = reorder(DataCounty, mean_OR), y = mean_OR, fill = N)) +
  geom_col() +
  scale_fill_gradient(low = "red", high = "blue") +
  labs(x = "Country", y = "Mean odds ratio") +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
fig_country_mean
```
The mean odds ratio varies considerably across countries and regions. The highest mean odds ratios are found in Cyprus, Peru, and the United Kingdom, and the lowest in Brazil, Jamaica, and Finland. The United States and China, which contribute the largest and second-largest samples, respectively, fall in the middle.
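Two caveats apply to these country means.

- Averaging raw odds ratios is asymmetric. An odds ratio of 2 and one of 0.5 are equally large effects in opposite directions, yet their arithmetic mean is 1.25.
- An unweighted mean gives a small study the same weight as a large one, so a country represented by only a few studies can end up at either extreme.

A common alternative is to average log odds ratios with inverse-variance weights. The sketch below is a hypothetical illustration of that alternative, not the analysis above:

- `pool_country()` is a name introduced here;
- it takes a fixed-effect (inverse-variance) average within each country;
- it adds 0.5 to every cell of any study that has a zero cell;
- it runs on made-up counts that reuse the column names of `dat`.

Its ranking of countries need not match the bar chart.

```r
# Asymmetry of averaging raw odds ratios
mean(c(2, 0.5))            # returns 1.25
exp(mean(log(c(2, 0.5))))  # returns 1

# Hypothetical sketch: inverse-variance pooling of log odds ratios per country
pool_country <- function(d) {
  cols <- c("MaleSucc", "MaleFail", "FemaleSucc", "FemaleFail")
  d <- d[complete.cases(d[, cols]), ]           # drop studies with missing counts
  m <- as.matrix(d[, cols])
  has_zero <- rowSums(m == 0) > 0               # studies with an empty cell
  m[has_zero, ] <- m[has_zero, ] + 0.5          # continuity correction
  yi <- log(m[, "MaleSucc"] * m[, "FemaleFail"] /
              (m[, "MaleFail"] * m[, "FemaleSucc"]))
  vi <- rowSums(1 / m)                          # Woolf variance of the log OR
  w <- 1 / vi
  agg <- aggregate(data.frame(wy = w * yi, w = w, k = 1),
                   by = list(country = d$DataCounty), FUN = sum)
  agg$country <- as.character(agg$country)
  agg$pooled_OR <- exp(agg$wy / agg$w)          # back-transform the weighted mean
  agg <- agg[order(agg$pooled_OR), c("country", "k", "pooled_OR")]
  rownames(agg) <- NULL
  agg
}

# Made-up counts for three hypothetical countries
toy_counts <- data.frame(
  DataCounty = c("X", "X", "Y", "Y", "Z"),
  MaleSucc   = c(30, 12, 20,  8,  5),
  MaleFail   = c(70, 88, 80, 92, 45),
  FemaleSucc = c(25, 15, 22,  0,  9),
  FemaleFail = c(75, 85, 78, 100, 41)
)
pooled <- pool_country(toy_counts)
pooled
```

Fixed-effect pooling within a country ignores heterogeneity among that country's studies. A random-effects model per country would be the more cautious choice when a country has many studies.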
```r
# Lookup table of all countries, each labelled "Others" by default
colorset <- dat_country %>%
  select(DataCounty)
colorset$highlights <- "Others"

# Highlight Cyprus
dat_cyprus <- left_join(dat, colorset, by = "DataCounty")
dat_cyprus$highlights[dat_cyprus$DataCounty == "Cyprus"] <- "Cyprus"

# Visualize Cyprus: one point per study, sized by sample size
fig_cyprus <- dat_cyprus %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight Cyprus", x = "Year", y = "Odds ratio")
fig_cyprus
```
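The highlight plots place each study's odds ratio against its application year. To put a rough number on a within-country trend, one could regress the log odds ratio on year for the highlighted country. The sketch below is hypothetical:

- `country_trend()` is a name introduced here;
- it weights studies by sample size `N`, as a crude stand-in for inverse-variance weights;
- it drops studies whose odds ratio is 0, infinite, or missing;
- it runs on made-up data that reuse the column names of `dat`.

With only a handful of studies per country, such slopes would be very imprecise.

```r
# Hypothetical sketch: weighted slope of log(OR) on application year for one country
country_trend <- function(d, country) {
  sub <- d[d$DataCounty %in% country & is.finite(log(d$OR)), ]  # usable studies only
  fit <- lm(log(OR) ~ ApplicYearMost, data = sub, weights = N)
  unname(coef(fit)["ApplicYearMost"])
}

# Made-up data for two hypothetical countries
toy_trend <- data.frame(
  DataCounty     = rep(c("United States", "Other"), each = 4),
  ApplicYearMost = rep(c(1980, 1990, 2000, 2010), 2),
  OR             = c(1.5, 1.2, 1.0, 0.8, 1, 1, 1, 1),
  N              = c(200, 400, 300, 500, 100, 100, 100, 100)
)

slope <- country_trend(toy_trend, "United States")
exp(10 * slope)  # multiplicative change in the odds ratio per decade
```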
```r
# Highlight Peru
dat_peru <- left_join(dat, colorset, by = "DataCounty")
dat_peru$highlights[dat_peru$DataCounty == "Peru"] <- "Peru"

# Visualize Peru
fig_peru <- dat_peru %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight Peru", x = "Year", y = "Odds ratio")
fig_peru
```
```r
# Highlight the UK
dat_uk <- left_join(dat, colorset, by = "DataCounty")
dat_uk$highlights[dat_uk$DataCounty == "United Kingdom"] <- "United Kingdom"

# Visualize the UK
fig_uk <- dat_uk %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight UK", x = "Year", y = "Odds ratio")
fig_uk
```
```r
# Highlight the US
dat_us <- left_join(dat, colorset, by = "DataCounty")
dat_us$highlights[dat_us$DataCounty == "United States"] <- "United States"

# Visualize the US
fig_us <- dat_us %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight US", x = "Year", y = "Odds ratio")
fig_us
```
```r
# Highlight China
dat_cn <- left_join(dat, colorset, by = "DataCounty")
dat_cn$highlights[dat_cn$DataCounty == "China"] <- "China"

# Visualize China
fig_cn <- dat_cn %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight China", x = "Year", y = "Odds ratio")
fig_cn
```
```r
# Highlight Brazil
dat_br <- left_join(dat, colorset, by = "DataCounty")
dat_br$highlights[dat_br$DataCounty == "Brazil"] <- "Brazil"

# Visualize Brazil
fig_br <- dat_br %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight Brazil", x = "Year", y = "Odds ratio")
fig_br
```
```r
# Highlight Jamaica
dat_jm <- left_join(dat, colorset, by = "DataCounty")
dat_jm$highlights[dat_jm$DataCounty == "Jamaica"] <- "Jamaica"

# Visualize Jamaica
fig_jm <- dat_jm %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight Jamaica", x = "Year", y = "Odds ratio")
fig_jm
```
```r
# Highlight Finland
dat_fi <- left_join(dat, colorset, by = "DataCounty")
dat_fi$highlights[dat_fi$DataCounty == "Finland"] <- "Finland"

# Visualize Finland
fig_fi <- dat_fi %>%
  ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  theme_cowplot() +
  labs(title = "Highlight Finland", x = "Year", y = "Odds ratio")
fig_fi
```
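The highlight blocks above differ only in the country name and the plot title. A small helper could generate all of them. This is a minimal sketch, assuming a data frame with the same `DataCounty`, `ApplicYearMost`, `OR`, and `N` columns as `dat`; `add_highlight()` and `plot_highlight()` are hypothetical names introduced here. The helper uses `%in%` rather than `==`, so a study with a missing country stays in "Others", as in the blocks above.

```r
library(dplyr)
library(ggplot2)
library(cowplot)

# Hypothetical helper: label the chosen country (or countries) by name, the rest "Others"
add_highlight <- function(data, country) {
  data %>%
    mutate(highlights = if_else(DataCounty %in% country,
                                as.character(DataCounty), "Others"))
}

# Hypothetical helper: the same scatter plot as the highlight blocks above
plot_highlight <- function(data, country,
                           title = paste("Highlight", paste(country, collapse = ", "))) {
  add_highlight(data, country) %>%
    ggplot(aes(x = ApplicYearMost, y = OR, color = highlights, size = N)) +
    geom_point(alpha = 0.5) +
    geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
    theme_cowplot() +
    labs(title = title, x = "Year", y = "Odds ratio")
}

# Made-up rows, only to show the call
toy_plot <- data.frame(
  DataCounty     = c("Peru", "Chile", NA),
  ApplicYearMost = c(2000, 2005, 2010),
  OR             = c(1.2, 0.8, 1.0),
  N              = c(100, 200, 50)
)
p <- plot_highlight(toy_plot, "Peru")
```

With the `dat` object from above, `plot_highlight(dat, "United Kingdom", title = "Highlight UK")` should produce the same kind of figure as the UK block. `lapply(c("Cyprus", "Peru", "United Kingdom"), plot_highlight, data = dat)` would return a list of such figures. Passing several countries at once, e.g. `plot_highlight(dat, c("Brazil", "Jamaica", "Finland"))`, colors each of them separately on a single plot.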
to be continued…